Introduction

Nowadays a new package or a novel approach to a programming problem is often accompanied by an article that explains the authors’ concept. Readers frequently try to reproduce the results of such articles, but sometimes they run into problems and cannot obtain the authors’ results. The following chapters present four articles that were examined for reproducibility. As expected, some of the code was recreated successfully, but a few problems were also encountered along the way.

1st article

First, I’ll present an article that is fully reproducible. It was written by Maria Brigida Ferraro, Paolo Giordani and Alessio Serafini to explain how the fclust package can be used, and was published in “The R Journal” in June 2019. As its title, “fclust: An R Package for Fuzzy Clustering”, suggests, the package implements various fuzzy clustering methods. I was able to reproduce all of the code used in the article. Below, all five figures included by the authors are presented and compared with the figures that I recreated.

Figure 1: Scatterplot of the Butterfly data.

Figure 2: Scatterplot of the NBA teams on the plane spanned by the first two principal components. Points are marked according to the obtained partition (Cluster 1: red, Cluster 2: cyan).

Figure 3: Scatterplot of the synt.data2 dataset.

Figure 4: Scatterplot of relational data with plot method. Points are marked according to the obtained classification (Cluster 1: red, Cluster 2: cyan).

Figure 5: Barplot of the 16 key votes for the two clusters (n: green, y: blue).

To sum up, the article “fclust: An R Package for Fuzzy Clustering” could be fully reproduced. Interestingly, in some figures the colours of the clusters differ from the original, but the figures can nevertheless be considered reproduced.

2nd article

Similarly, I managed to fully reproduce all the code included in the article “Graphs and Networks: Tools in Bioconductor” from December 2006, written by Li Long and Vince Carey and published in R News. This case was more interesting than the first one because the article is 14 years old, so the probability of problems caused by incompatibilities among packages was quite high. No such problems occurred, although the packages had to be installed in a different way. I’ll now present all the reproduced figures from the article and compare them with the original graphs.

Figure 1: Rendering of the IMCA pathway.

Figure 2: Rendering with dot.

Figure 3: Rendering with neato.

Figure 4: Rendering with twopi.

Summing up, this old article was reproducible as well. Some visual differences can be noticed, but they are slight and not surprising for an article published 14 years ago.
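For reference, Bioconductor packages are nowadays usually installed through the BiocManager package rather than with install.packages() alone. The sketch below is only an illustration of that route: the helper name install_missing_bioc is hypothetical, and the package set graph, RBGL and Rgraphviz is my assumption about what the article needs, not a list taken from it.

```r
# Hypothetical helper: install only those Bioconductor packages
# that are not yet available in the current library.
install_missing_bioc <- function(pkgs) {
  # TRUE for every package whose namespace can already be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!installed]
  if (length(missing_pkgs) > 0) {
    # BiocManager itself lives on CRAN
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(missing_pkgs)
  }
  # Return the names that had to be installed (possibly none)
  invisible(missing_pkgs)
}

# Assumed package set; needs an internet connection, so it is left commented out:
# install_missing_bioc(c("graph", "RBGL", "Rgraphviz"))
```

Checking what is missing first keeps the call cheap to repeat, since packages that are already installed are not touched.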

3rd article

The third article that I’ll focus on was written in 2012 by Paul Murrell for “The R Journal”. It is called “What’s in a Name?” and stresses the importance of naming objects when preparing plots, since named objects can later be accessed, queried and modified. Most of the figures from the article could be reproduced easily (up to small differences that I’ll not dwell on, since the text is from 2012), so I’ll begin this chapter by presenting the graphs that belong to this group.
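Before the figures, here is a minimal sketch of the naming idea using the grid package that ships with R. It is not the article’s own code: the grob names leftCircle, middleCircle and rightCircle and the coordinates are hypothetical, chosen only to show that a named object can be listed, modified and queried after it has been drawn.

```r
library(grid)

# When run as a script, draw to a null PDF device so that no file is written
if (!interactive()) pdf(NULL)

grid.newpage()

# Draw three circles and give each one a name (names are hypothetical)
grid.circle(x = 0.25, r = 0.1, name = "leftCircle")
grid.circle(x = 0.50, r = 0.1, name = "middleCircle")
grid.circle(x = 0.75, r = 0.1, name = "rightCircle")

# List the named objects that are currently on the page
grid.ls()

# Access the middle circle by name and give it a grey background
grid.edit("middleCircle", gp = gpar(fill = "grey"))

# Query the modified object by name
middle <- grid.get("middleCircle")
middle$gp$fill
```

Without the name argument grid would assign automatically generated names, which are much harder to refer to later.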

Figure 1: Some simple shapes drawn with grid.

Figure 2: The simple shapes from Figure 1 with the middle circle modified so that its background is grey.

Figure 4: A simple lattice scatterplot.

Figure 5: The lattice plot from Figure 4 with the xaxis modified using low-level grid functions.

Figure 6: A complex multipanel lattice barchart.

Figure 7: The barchart from Figure 6 with the sixth set of bars in each panel highlighted.

Figure 8: The lattice plot from Figure 4 with a rectangle added around the x-axis label.

Figure 9: The lattice plot from Figure 4 with a rectangle added around the modified x-axis label.

Figure 10: The lattice plot from Figure 4 with the xaxis label redacted (replaced with a black rectangle).

Figure 11: The lattice plot from Figure 4 transformed into an SVG document with a hyperlink on the x-axis label.

Figure 12: The simple shapes from Figure 1 with the text grob modified using a single cex value.

Figure 13: The simple shapes from Figure 1 with the text grob modified using three distinct cex values.

For these figures the reproducibility, although not perfect, is clearly visible and beyond dispute. However, Figure 3 is missing from the list above. It is an interesting case, because the article does not include the code that produced this figure. The figure therefore cannot be reproduced without a considerable amount of work. This raises the question of whether the article as a whole is reproducible. Perhaps the reproducibility of an article could be expressed as a percentage?
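One naive way to make this idea concrete is the share of numbered figures that could be recreated. The sketch below assumes that the thirteen numbered figures are all the figures in the article and that every figure counts equally. Both are simplifications of mine, not a measure defined by the article.

```r
# Figure numbers in the article (assumption: 13 numbered figures in total)
all_figures <- 1:13

# Figure 3 came without code, so it is counted as not reproduced
not_reproduced <- 3
reproduced <- setdiff(all_figures, not_reproduced)

# Share of figures that could be recreated, expressed as a percentage
reproducibility_pct <- 100 * length(reproduced) / length(all_figures)
round(reproducibility_pct, 1)
```

With 12 of 13 figures recreated this gives roughly 92%. A fairer measure would probably also weight how closely each recreated figure matches the original.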

4th article

Finally, I’d like to devote some space to an article that I could not reproduce to any extent. It was published in 2011 by Frédéric Lafitte, Dirk Van Heule and Julien Van hamme and is titled “Cryptographic Boolean Functions with R”. It describes a package called boolfun that lets R users work with Boolean functions. However, when I started reproducing the code included in the article, I encountered a serious obstacle. When I searched for the package in the CRAN repository, I found that it is no longer offered there and is available only as an archived version.

After deciding to try this archived version anyway, I faced further challenges. First I learnt that 15 packages were not in a version suitable for working with the boolfun package, and then I was told that “package ‘boolfun’ is not available (for R version 3.6.1)”. This information led me to conclude that the article “Cryptographic Boolean Functions with R” is no longer reproducible.
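For readers who want to repeat the attempt, the usual route to an archived package is to install its source tarball straight from the CRAN archive. The sketch below only builds the download address: the helper name cran_archive_url is hypothetical, and the version number is deliberately left as a placeholder because it has to be read from the archive listing. As described above, in my case the installation still failed because of the dependencies.

```r
# Hypothetical helper: build the address of an archived source package on CRAN.
# The version has to be looked up in the archive listing of the package.
cran_archive_url <- function(pkg, version) {
  sprintf(
    "https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
    pkg, pkg, version
  )
}

# Installing from the archive (needs internet and a working build toolchain),
# left commented out and with a placeholder instead of a real version number:
# install.packages(cran_archive_url("boolfun", "<version>"),
#                  repos = NULL, type = "source")
```

Setting repos = NULL tells install.packages() to treat the argument as a file or URL instead of looking the name up in a repository.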

Summary

Of the articles considered, two were fully reproducible, one could be reproduced to some extent and one could not be reproduced at all. The articles were published in different years, so even these four texts allow a tentative conclusion: the older an article is, the harder it tends to be to reproduce, although this does not hold for all of the articles. The 2006 article, for instance, was fully reproducible, while the one from 2011 could not be reproduced at all. Of course, four articles are not enough to justify such a statement, so it may be an interesting idea to explore further.